CUM: An Efficient Framework for Mining Concept Units
Author
Abstract
The Web is the most important repository of media of many kinds, such as text, sound, video, and images. Web mining is the process of applying data mining techniques to automatically discover knowledge from such diverse and very large data, so that it can be more easily browsed, organized, and catalogued with minimal human intervention. A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. A large portion of web search activity aims to locate a set of concept entities relevant to the user query. The web unit mining problem was proposed to discover these concept entities and classify them into categories. Web page classification mainly assigns one or more concept labels to every web page based on its own content, without considering neighbouring web pages. Existing iterative Web Unit Mining (iWUM) algorithms create more than one web unit (incomplete web units) from a single concept entity. In this paper, we propose a novel non-iterative web unit mining algorithm, Concept Unit Mining (CUM), which finds the set of web pages forming each web unit and assigns the web unit a concept label based on the structure of the web pages, so as to reduce later classification errors. Our experiments using the WebKB dataset show that this disadvantage of iWUM algorithms is removed and overall accuracy is significantly improved.
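The abstract does not spell out the steps of CUM, so the following is only a minimal sketch of the underlying notion of a concept entity as a set of web pages connected by hyperlinks. It assumes, purely for illustration, that a candidate unit can be approximated by a connected component of the undirected hyperlink graph. The function name `group_pages_into_units` and the example URLs are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def group_pages_into_units(links):
    """Group pages into candidate units.

    Assumption (not from the paper): a candidate unit is a connected
    component of the hyperlink graph, with links treated as undirected.
    """
    graph = defaultdict(set)
    for src, dst in links:
        graph[src].add(dst)
        graph[dst].add(src)

    seen = set()
    units = []
    # Visit pages in sorted order so the result is deterministic.
    for page in sorted(graph):
        if page in seen:
            continue
        # Depth-first traversal collecting every page reachable from `page`.
        stack = [page]
        seen.add(page)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        units.append(sorted(component))
    return units


# Hypothetical hyperlinks: a three-page home page and a two-page course site.
links = [
    ("/~smith/index.html", "/~smith/pubs.html"),
    ("/~smith/index.html", "/~smith/cv.html"),
    ("/cs101/index.html", "/cs101/syllabus.html"),
]
units = group_pages_into_units(links)
```

On a real web site nearly all pages are linked to one another, so a practical method would need additional structural cues to separate units; the sketch only shows what it means for several pages to be treated as one unit rather than labelled page by page.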
Similar resources
Efficient Income Redistribution for a Small Country Using Optimal Combined Instruments
In this paper I improve Gardner's surplus transformation curve framework by assuming that government is able to vary many policy instruments simultaneously instead of only one. I use my framework to find the combination of the currently used instruments which provides the most efficient income redistribution for the Austrian bread grains market. Contrasting the most efficient policy to the actu...
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data-oriented approach to evaluate performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively limited period, DEA has been converted into a strong quantitative and analytical tool to measure and evaluate performance. In an article written by Toloo et al. (2009...
Data Mining with Distributed Agents in E-Commerce Applications
In this paper we describe the prototype of a yellow page service for customers in a distributed cyber-shopping mall. This application combines distributed data mining with agent technologies. The paper focuses on a framework to support distributed data mining. Data mining approaches have dealt with finding interesting patterns, however, there is little research on developing a framework for eff...
Ranking Efficient Decision Making Units Using Cooperative Game Theory Based on SBM Input-Oriented Model and Nucleolus Value
In evaluating the efficiency of decision making units (DMUs) by Data Envelopment Analysis (DEA) models, more than one DMU may have an efficiency score equal to one. Since ranking efficient DMUs is essential for decision makers, methods and models for this purpose have been presented. One method of ranking efficient DMUs is cooperative game theory. In this study, Lee and Lozano mod...
Integrating Classification and Association Rule Mining: A Concept Lattice Framework
Concept lattice is an efficient tool for data analysis. In this paper we show how classification and association rule mining can be unified under concept lattice framework. We present a fast algorithm to extract association and classification rules from concept lattice.
Journal title:
Volume, Issue
Pages -
Publication date 2008